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Abstract 

Background: Time-course gene expression experiments are useful tools for exploring biological processes. In this 
type of experiments, gene expression changes are monitored along time. Unfortunately, replication of time series is 
still costly and usually long time course do not have replicates. Many approaches have been proposed to deal with 
this data structure, but none of them in the field of pathway analysis. Pathway analyses have acquired great 
relevance for helping the interpretation of gene expression data. Several methods have been proposed to this aim: 
from the classical enrichment to the more complex topological analysis that gains power from the topology of the 
pathway. None of them were devised to identify temporal variations in time course data. 

Results: Here we present timeClip, a topology based pathway analysis specifically tailored to long time series 
without replicates. timeClip combines dimension reduction techniques and graph decomposition theory to 
explore and identify the portion of pathways that is most time-dependent. In the first step, timeClip selects the 
time-dependent pathways; in the second step, the most time dependent portions of these pathways are 
highlighted. We used timeClip on simulated data and on a benchmark dataset regarding mouse muscle 
regeneration model. Our approach shows good performance on different simulated settings. On the real dataset, 
we identify 76 time-dependent pathways, most of which known to be involved in the regeneration process. 
Focusing on the 'mTOR signaling pathway' we highlight the timing of key processes of the muscle regeneration: 
from the early pathway activation through growth factor signals to the late burst of protein production needed for 
the fiber regeneration. 

Conclusions: timeClip represents a new improvement in the field of time-dependent pathway analysis. It 
allows to isolate and dissect pathways characterized by time-dependent components. Furthermore, using 
timeClip on a mouse muscle regeneration dataset we were able to characterize the process of muscle fiber 
regeneration with its correct timing. 



Background 

Time course gene expression experiments are widely 
used to study the dynamics of biological processes. 
Usually, the main goal of such experiments is to identify 
genes modulated along a biological process or after a 
system perturbation (such as drug treatments or genetic 
modifications). However, time course data are costly 
and usually long time series have few or no replicates. 
In this context a differentially expressed gene can be 
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defined as a gene with the expression profile changing 
significantly along time and/or across multiple condi- 
tions. Several statistical models have been proposed to 
account for clusters and differential expression in the 
contest of time series with [1-18] and without replicates 
[10,19-21], but none of them were proposed in the con- 
text of pathway analysis. Pathway analysis has acquired 
great relevance in the last years especially for the ability 
to increase interpretability of gene expression results. 
Expression experiments typically provide lists of differ- 
entially expressed genes (DEGs) that represent the start- 
ing point for result interpretation. This step is not trivial 
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and remains challenging for this type of analysis. The 
grouping of genes into functionally related entities (such 
as pathways) is of great help in the interpretation of the 
results. Several methods have been proposed to this 
aim, based on very different statistical tests and null 
hypotheses [22,23]. Broadly speaking, they can be 
divided into the classical enrichment analysis [24-28], 
working on gene lists selected through a gene-level test, 
and the novel global and multivariate approaches 
[29-37], that define a model for the whole gene set (see 
[22,38-40] for a comprehensive reviews and comparative 
analysis). The latter can be further divided into 'topolo- 
gical' and 'non-topological' methods according to their 
ability to gain power from the topology of the pathway 
[25,35,36,41-43]. 

A pathway is a complex structure comprising chemical 
compounds mediating interactions and different types of 
gene groups (e.g. protein complexes or gene families) 
that are usually represented as single nodes but whose 
measures are not available using gene expression data. 
However, after appropriate biologically-driven conver- 
sion [44,45], a biological pathway can be represented as 
a graph where genes and their interactions are, respec- 
tively, nodes and edges of the graph. 

Taking advantage of the structure of the graph, Massa 
et al. [35] used Gaussian graphical model theory to test 
both differences in mean and in covariance matrices 
between two experimental conditions. In particular, gra- 
phical models are useful to decompose the overall graph 
(obtained from a pathway) into smaller components (cli- 
ques), that can be explored and tested in detail. Martini et 
al. [36] proposed an extension of this method, called 
CliPPER, based on a two-step empirical approach. In the 
first step, it selects pathways with covariance matrices 
and/or means significantly different between experimental 
conditions dealing with the p >> n case; in the second 
step, it identifies the sub-paths (called signal paths) most 
associated with the phenotype. 

Pathway analysis is mainly tailored to two-groups 
comparisons and few efforts have been dedicated to the 
time course design. Here, we propose a modification of 
[36], called timeClip, to deal with long time course 
data without replicates. Specifically, timeClip com- 
bines principal component analysis, regression models 
and graph decomposition to explore temporal variations 
across and within pathways. Moreover, timeClip 
implements an easy and effective visualization of the 
dynamics of the pathways. 

On simulated datasets, timeClip shows good perfor- 
mances in term of power, specificity and sensitivity. 
Using real data on mouse muscle regeneration [46], we 
obtain excellent results in agreement with the scientific 
literature. 



Method 

Pathway annotation 

A critical step in the field of topology based pathway ana- 
lyses is the availability and the quality of the pathway 
topology. Our group has recently developed graphite a 
Bioconductor package for the storage, interpretation and 
conversion of pathway topology to gene-only networks 
[44]. graphite discriminates between different types of 
biological gene groups and propagates gene connections 
through chemical compounds. Specifically, protein com- 
plexes are expanded into a clique (all proteins connected 
to the others), while the gene families are expanded with- 
out connections among them; see [44,45] for more details. 
The current version of graphite Bioconductor package 
is limited to human, so here we build a dedicated 
graphite package for mouse KEGG pathways. This 
package is available at http://romualdi.bio.unipd.it/ 
wp-uploads/2013/10/graphite.mmusculus_0.99.2.tar.gz. 

timeClip: general approach 

A pathway is composed by multiple genes so to reduce 
the dimension of a whole or of a portion of a pathway, 
we used principal component analysis. Then the first 
principal component is explored for temporal variation. 
A vast amount of techniques exist for analyzing regu- 
larly sampled time series. Unfortunately, the irregular 
sampling of the values (a common practice in biology) 
makes direct use of such estimation techniques impossi- 
ble. To avoid the well known biases associated with the 
most common approach for irregularly sampled time 
series based on transforming unevenly- spaced data into 
equally spaced observations using some form of interpo- 
lation, here we propose to use a regression model com- 
bining a polynomial trend and a continuous-time 
Gaussian autoregressive process of order 1 (AR(1)). 
Then, timeClip resembles the two-steps approach of 
CliPPER. In the first step, the whole pathway is 
explored for its temporal variation. If the pathway is 
defined as time-dependent, in the second step, timeClip 
decomposes the pathway into a junction tree and high- 
lights the portion mostly dependent on time. A general 
schema of the approach is summarized in Figure 1. 
Step 1: exploring the whole pathway 

Let X n ,< t be the normalized log transformed gene 
expression matrix with genes on the rows and experi- 
ments (equal to time points t) on the columns. Let Xp xt 
the sub-matrix of genes belonging to pathway P. Path- 
way P has p genes. Then, on the transpose of X , X , 
we perform principal component analysis (PCA). We 
used both the classical (R package stats) and the robust 
(rrcov R package) version of PCA. Let Z? x f be the scores 
matrix and L^ x f the loadings matrix. We call 7F X , ■ ■ ■ , Zf 
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Figure 1 timeClip. Global overview of timeClip approach. 



the p principal components. In this way, the first PCs 
summarize the temporal variation of the genes in path- 
way P (if present). Thus, from now on we will indicate 7E 
as Z\ (t). A similar approach was recently proposed by 
[15] (PCA-maSigFun). PCA-maSigFun uses principal 
component analysis to identify temporally- homogeneous 
groups of gene within the pathway. 

Then, for irregularly sampled time series we assume that 
our irregularly sampled signal Zf (t) can be decomposed 
as Z(t) = p{t) + £(?), where pit) is a deterministic function, 
hereafter called "trend", and G(t) is the realization of a 



stationary stochastic process with mean zero. Extensive 
exploratory analysis suggests that a reasonable choice for 
the trend component is a polynomial of degree 2 in t, i.e., 

p(t) = p Q + p l t + p 1 t 2 

with Pi capturing existing temporal behaviors of Z?(t) 
and f} 2 correcting for potential non linearities. 

Moreover, we assume that G(t) follows a continuous- 
time Gaussian autoregressive process of order 1. The 
model is fitted using generalized least squared (as imple- 
mented in nlme R package). The representative p - value 
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of pathway P, p P , is then taken to be the p - value of the 
test of nullity of fii (obtained by a t-test as implemented in 
the gls function of the nlme R package). Bonferroni cor- 
rection is used to adjust p - values for multiple tests. 

We evaluated the possibility to fit a polynomial regres- 
sion not only on the first PC, but also on few additional 
7F it with i = 2, 3. However, we did not find significant 
improvements in the final list of significant time-dependent 
pathways (data not shown). 
Step 2: decomposing the pathway 

Pathways declared as time-dependent in step 1 are then 
moralized, triangulated and decomposed into a junction 
tree as described in [36]. 

Briefly, moralization inserts an undirected edge between 
two nodes that have a child in common and then elimi- 
nates directions on the edges; triangulation inserts edges 
in the moralized graph so that in the moralized graph all 
cycles of size > 4 have chords, where a chord is defined as 
an edge connecting two non-adjacent nodes of a cycle. A 
clique in the triangulated graph is a complete subgraphs 
having all its vertices joined by an edge while a junction 
tree construction is a hyper-tree having cliques as nodes 
and satisfying the running intersection property according 
to which, for any cliques Q and C 2 in the tree, every cli- 
que on the path connecting Ci and C 2 contains Cj n C 2 
[36,47]. For a given graph there could be more than one 
junction tree. Here we force the root of the junction tree 
to be in agreement with the structure of the pathway. 

A clique k of pathway P, noted as C£ (with k = 1,..., K), 
is composed by a subset of genes in P, c£. Let X p ck be the 
sub-matrix of X corresponding to the genes of the cli- 
que C|°. For each clique k of P we apply the same 
approach as described in step 1: PCA transformation 
and then a linear model with polynomial trend and 
autoregressive process of order 1 on the first PCs. The 
p- value of clique k in pathway P, Pc\] is given by the 
p of the Pi of the polynomial regression. Finally, the 
best time-dependent paths within a pathway P, hereafter 
called Spj, = 1,..., J, are identified using the relevance 
measure as described in [36]. Briefly, a path is a chain of 
consecutive time-dependent cliques (pc£ — 0.05) with 
gaps at most of size one. Then, for each path in the 
pathway a cumulative score is calculated along the path: 
lower the the p - value of a clique in the path, higher 
the contribution to the score, in case of gap the score is 
penalized. The final score of a path is the maximum 
value reached by the score along the path. Then, the 
score is normalized for the path length; this quantity is 
called relevance [36]. 

As final results, for each time-dependent pathway, we 
report a list of relevant paths, ranked according to their 
relevance. Currently, step 2 is the most innovative feature 
of timed ip and, as far as we known, there are no 
existing tools using a similar strategy. 



Simulated data 

As some paths may be declared time-dependent by 
timeClip step 2 simply as a consequence of type I 
errors in timeClip step 1, we used a simulation to 
evaluate the percentage of false positives under the null 
hypothesis and to estimate the statistical power in differ- 
ent scenarios. 

False positive rate estimation 

Given a pathway P and its graph structure (G), for 1,000 
runs we randomly generate a gene expression matrix X n x t 
from a multivariate normal distribution with zero mean 
and variance I, with e S+ ( G ) ( where ^(G) is ^ set of 
symmetric positive definite matrices with null elements 
corresponding to the missing edges of G). In this case, 
gene expression profiles are time independent. Then, for 
each run we calculate p P (either for the case of irregularly 
and regularly sampled time points, see Section Step 1: 
exploring the whole pathway). Under this scenario, at the 
nominal level a = 0.05 we expect a number of rejections 
around 5%. We repeat the simulation for different values 
of n (n = 5, 10, 15, 20, 25, 30) and t (t = 5, 10, 15, 20, 30). 
Power estimation 

In order to be sure that the model were able to identify 
time-dependency coming from different models, we simu- 
late data using polynomial models, autoregressive models 
of order 1 and a combination of both (polynomial models 
with autocorrelated errors). Then, the power is estimated 
for irregularly and regularly sampled time points. 

Given a pathway P and its graph structure (G), for 
1,000 runs we randomly generate a gene expression 
matrix X( n _ s ) x t from a multivariate normal distribution 
with zero mean and variance E with e S + (G). 
Then, the expression profiles of the remaining s genes, 
with s < n are simulated to have different degree of 
time-dependency. Specifically, we use polynomial models 
(Equation 1), autoregressive models of order 1 (Equation 
2, where G* is a white noise) and the combination of 
both (Equation 3, where G an AR(1)). 

x s (t) = a Q + ait + a 2 t 2 + 6* (1) 
x s [t) = <po +<PiX s (t- 1) + 6* (2) 

X s (i) = a 0 + ait + a 2 t 2 + <pie s (t - 1) (3) 

The coefficients or are independently generated from a U 
(-5, 5), and (pi are generated so as to achieve stationarity. In 
this way, we simulate expression profiles with different 
degrees of temporal variations. Then, for each run we cal- 
culate p P (see Section Step 1: exploring the whole pathway). 
Under this scenario, the number of rejection estimates the 
statistical power. We repeat the simulation for different 
combinations of <j>, n, s and t. 
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Real data: Muscle regeneration model 

The benchmark dataset used [46] (GSE469) follows 
mouse muscle regeneration after intra-muscluar injec- 
tion of cardiotoxin. Regeneration process is followed for 
27 unevenly spaced time-points with only two technical 
replicates for each time-point. Expression data were pro- 
duced using single channel Affimetrix microarrays. The 
probes in the platform were annotated with EntrezGene 
custom CDF (version 14) [48] and data was normalized 
using the robust multi array analysis (rma) and quantile 
normalization. Then, technical replicates were averaged 
to get one measure for every time-point. 

Implementation and Visualization: the wheel of time 

timeClip is implemented as an R package available from 
the authors. The package allows to analyze equally and 
non-equally spaced time series according to the user set- 
ting. To get better insights into the temporal activation of 
the different portions of the pathway, we develop a new 
way of visualization using Cytoscape software [49] and 
Rcytoscape Bioconductor package. The visualization, 
called the wheel of time, allows visualizing pie charts 
inside network nodes. For each pathway, timeClip 
exports in Cytoscape the structure of the junction tree 
where each time-dependent clique has a pie chart that 
represents the time trend. Specifically, the pie is divided 
into as many slices as the number of time points in the 
dataset. Each slice in the pie is colored (from green to red) 
according to the scores of first principal component: the 
higher the value, the stronger the activation of a clique in 
a specific time point (red color) and viceversa (green). 

Results and discussion 

Many biological processes need to be followed and 
monitored along time. In these cases time course 
designs are ideals: higher the number of time points, fin- 
est the monitoring process. However, long time courses 
are often characterized by small or no replicates. Here, 
we present timeClip, a two-step approach to perform 
topological pathway analysis for time course gene 
expression data, specifically tailored to long time series 
without replicates (Figure 1). In the first step, we select 
pathways that show time dependency. In the second 
step, the selected pathways are decomposed into cliques 
and the time-dependent portions are isolated. In the 
next sections, we will show the performance of time- 
Clip using simulated and real datasets. 

Simulations results 

Two simulation strategies have been considered. The first 
one was designed to estimate the number of false posi- 
tives under the null hypothesis of no temporal variation, 
the second to estimate the statistical power (see section 
method for details). 



Table 1 Simulation results - False positives rate with 
different pathway dimensions n and irregularly sampled 
time points t. 
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Table 1 and Table SI (Additional file 1) report the per- 
centage of false positives obtained with different n and t 
for the irregularly and regularly sampled time points, 
respectively. The average false positive percentage for 
each t and n is always limited to -4-5%, with the excep- 
tion of small time series (t = 5) and equally spaced time 
points where it is slightly higher. Thus, we can conclude 
that, in general, for long time series we have an excellent 
control of type I error even with exceptionally low sam- 
ple sizes. 

Table 2 and Table S2 (Additional file 1) report the 
number of true positives obtained with n = 30 and differ- 
ent t and 5 for equally and not-equally spaced time points 
respectively. Here, the genes with temporal variation are 
simulated using different models (if s is the number of 
time-dependent genes among the n of the pathway, we 
simulate s/3 with polynomial, s/3 with AR(1) and s/3 
with the combination of both). As expected, the power 
increases with the increase of t and 5: the longer the time 
course and the higher the number of time dependent 
genes s within the pathway, the higher the power. 

Specifically, when the time course is short (t = 10 - 20) 
the maximum power reaches 60%, while with long time 
series t = 30 the power is above 80%. Moreover, it is worth 
noting that the increase of the time dependent genes does 
not affect significantly the power level. The greater impact 
that the number of time points has on statistical power 
with respect to the number of time-depending genes can 
be explained by the presence of two steps in our strategy: 
i) a data reduction step (with PCA on genes within 



Table 2 Simulation results - Power estimate in case of 
n = 30 and different time course length t and time 
dependent genes s. 
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Irregularly sampled time points. 
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pathways) and ii) a model-fitting step of the reduced vari- 
ables on time points. PCA is an efficient method to detect 
variance components in the data. Thus, even in case of a 
small number of time-dependent genes, the first PC is able 
to capture the time trend when present. On the other 
hand, once the trend is captured, the goodness of fit of the 
regression model increases by increasing the number of 
time points. The use of robust PCA does not change the 
performance of the method substantially (data not shown). 

Case study: muscle regeneration model 
Step 1 results 

In step 1 every pathway is explored for its temporal 
dependence. In the benchmark dataset, we have to deal 
with 27 not equally spaced times (14 of which are 
equally spaced). 

Comparing step 1 results for equally and not equally 
spaced time-point we obtain an overlap of 70%. This 
high degree of overlap makes us confident about the 
reliability of our approach. We summarized the results 
in the heat map of Figure 2 (values reported in Addi- 
tional file 2). The heat map is obtained using the scores 
of the first principal component of each time-dependent 
pathway. From the unsupervised cluster analysis, we can 
define 3 pathway groups characterized respectively by a 
'very early', 'early-intermediate' and 'intermediate-late' 
activation. Pathways characterized by a very early activa- 
tion like 'Malaria' and 'African trypanosomiasis' reflect 
the early activation of the inflammation processes 
deputed to clean injured fibers. These processes are car- 
ried-out by macrophages that have a central role in the 
'Malaria' and 'Africa trypanosomiasis' pathways. Macro- 
phages clean up injured fiber and release growth factors 
like vascular endothelial growth factor (VEGF) and 
hepatocyte growth factor (HGF) [50]. 

In the early-intermediate pathway group, we can see 
the effects of the early signal secretion: in fact, the 
group contains pathways like 'mTOR signaling pathway', 
'VEGF signaling pathway', 'Insulin signaling pathway' 
and other metabolic pathways like 'Ether lipid metabo- 
lism' and 'Citrate cycle (TCA cycle)'. Globally, these 
pathways indicate that the regeneration progress has 
begun. 

'mTOR signaling pathway', probably the most impor- 
tant pathway in the muscle regeneration, on one side sus- 
tains VEGF signaling and on the other promotes protein 
production needed for clonal expansion of the myoblasts, 
their growth and fusion. In particular, mTOR integrates 
growth factor signaling with a variety of signals from 
nutrients (amino acids metabolism activate mTOR path- 
way) and cellular energy status [51]. The energy status of 
the cell is indeed monitored by those pathways involved 
in energy metabolism like 'carbohydrate digestion and 



adsorption', 'Citrate cycle (TCA cycle)' and 'Fatty acid 
metabolism'. These processes are very important in the 
regeneration process, in fact, it was demonstrated that 
glycolitic metabolism is restored after three days from 
myofibril formation [52]. 

Intermediate-late activation pathways mainly present 
pathways involved in inflammatory responses like 'B and 
T Cell receptor signaling pathway', 'Toll-like receptor 
signaling pathway', Adipocytokine signaling pathway' 
and 'Leukocyte transendothelial migration'. Recent dis- 
coveries reveal complex interactions between skeletal 
muscle and the immune system that regulate all phases 
of the muscle regeneration [50]. Moreover in this path- 
way group there is the 'Axon guidance' and 'Dopaminer- 
gic synapse' pathways that are involved in nervous 
impulse transduction. We can speculate that at the end 
of the regenerative processes nervous system can con- 
tact the restored contractile cells to ensure and maintain 
their functionality. 

This contains also pathways involved in signaling trans- 
duction like 'HIF-1 signaling pathway'. HIF-1 has been 
recently demonstrated to be essential for skeletal muscle 
regeneration in mice [53]. In fact this pathway manages a 
plethora of signals and interface with pathways like 
mTOR signaling pathway, PI3K-Akt signaling pathway, 
MAPK signaling pathway, Citrate cycle (TCA scycle), 
Calcium signaling pathway, VEGF signaling pathway and 
Ubiquitin mediated proteolysis. Together with all these 
pathways, 'HIF-1 signaling pathway' finely tune the bal- 
ance between oxygen consumption. 

In step 1, we are able to see only the strongest signals 
and not always the pathway name alone reflects the 
activity of the pathway. To tackle the complexity of the 
pathway, timeClip step 2 deeply investigates the tim- 
ing activation of different portion of the pathway. 
Step 2 results 

In the second step, we focused on the the Akt-mamma- 
lian target of rapamycin (mTOR) signaling pathway. It 
regulates a pletora of signals: cell growth, VEGF signaling 
pathway, autophagy and its action is related to other 
pathways known to be involved in the muscle regenera- 
tion like Insulin signaling pathway and MAPK signaling 
pathway [54]. 

The junction tree of mTOR signaling pathway (Figure 
3A) starts with Igfl (Insulin-like growth factor 1) as 
represented in the KEGG map (Figure 3B). Within 
mTOR signaling pathway we identified a total of 6 paths, 
ranked by their relevance score (Table 3). 

The most relevant of these paths goes from the 1 st to the 
21 5f clique and contains 16 cliques. The second and the 
third path share a big portion with the first one. This big 
portion goes from clique 1 to cliques 13 (blue nodes on 
the junction tree - Figure 3A) and contains genes like Igfl, 
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Figure 2 Heat map of pathway PCs. Heat map colored according to the expression of the first PCs from green to red. According to the color 
pattern, pathways are divided in early, early-intermediate and late-intermediate. Time is measured in hours after treatment. 



Insulin, Mapk3, Mtor and Akt that globally represent the 
backbone of the pathway where the starting activating sig- 
nal is regulated by IgfL Then Pi3k, Mapk and Akt trans- 
late the signal and activate Mtor that organize the 



effectors. From the junction tree we can identify three dif- 
ferent terminal effectors: the first, in pink, is the portion 
that brings to the VEGF signaling pathway. The second, in 
purple, is the regulation of autophagy and the third, in 
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Figure 3 Activation of the mTOR signaling pathway. Panel A. Junction tree of the mTOR signaling pathway (using graphite R package and 
database KEGG). The top ranked time-dependent paths identified in timeClip step 2 are highlighted using the wheel of time visualization. 
Panel B. KEGG representation of mTOR signaling pathway. Genes are colored according to the paths in panel A. Panel C. Enlargement of the 
wheels of time representative of the main block of mTOR signaling pathway: from f 0 to f 27 (clock-wise) every slice of the pie is colored 
according to the value of the clique first PC (green means no activation; red means activation). 



yellow, is the regulation of protein synthesis that is neces- 
sary for the skeletal muscle mass recovery during regener- 
ating processes [55]. In the panel C of Figure 3, we 
summarized the timing of the 'mTOR signaling pathway' 
activation. With the wheel of time, we can see that the 
pathway backbone is activated in the early phases. The 
portion that brings to the VEGF signaling pathway is acti- 
vated in the late phases. The effectors that bring to autho- 
phagy are switched off at the end of the regenerative 



Table 3 mTOR signaling pathway: relevant paths 
identified by timeClip step 2 

path starting 
clique 

21 1 

23 1 

16 1 

12 1 

1;4 1 

25:26 25 



ending 
clique 


lenght 


Relevance 


average 
Relevance 


21 


16 


102.11 


6.38 


23 


13 


74.39 


5.72 


16 


12 


67.79 


5.65 


12 


5 


27.43 


5.49 


4 


4 


26.01 


6.5 


26 


1 


1.9 


1.9 



precess while the activation of the protein synthesis begun 
from the early-intermediate phases and last till the end of 
the process. 

Recently, as discussed before, it was demonstrated the 
involvement of HIF-1 in the skeletal muscle regeneration 
process [53]. We observed that the most relevant path of 
HIF-1 signaling pathway is 37 cliques long underlining its 
importance in this process. This path is activated by differ- 
ent growth factors (Igf, Ins, Egf) and signals are translated 
through Akt and mTOR towards Hl¥-la/p. Hif-la regu- 
lated many processes from the oxygen balance to apopto- 
sis (See Additional file 3). Such downstream effectors 
confirm its importance in skeletal muscle regeneration in 
accordance with results obtained from [53]. 

Comparison with other methods 

In this section we compare timeClip step 1 results with 
the methods proposed by [15]. Step 2, that is the most 
innovative feature of timeClip, cannot be compared to 
any existing tool. [15] proposed two different strategies. 
The first one, called maSigFun, considers individual 
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genes as different observations of the expression profile 
of the pathway. The second approach PCA-maSigFun 
uses PCA to identify groups of genes showing different 
time-dependencies. maSigFun did not give any signifi- 
cant time-dependent pathway using our dataset describing 
skeletal muscle regeneration (p < 0.05), while PCA-maSig- 
Fun returned 59 significant KEGG pathways (p < 0.05). 26 
out of 59 (44%) pathways are in common with time Clip 
step 1 results. Indeed, both the methods retrieve mTOR 
signaling pathway, however PCA-maSigFun did not call 
HIF signaling pathway as significant, although it seems to 
be closely related to the muscle regeneration [53]. Most of 
the PCA-maSigFun specific pathways (15 out of 33) 
referred to metabolic processes like Inositol phosphate 
methabolism, Pyruvate metabolism, Tyrosine metabolism, 
Glycerolipid metabolism. The remaining pathways are 
highly heterogeneous and comprise Acute myeloid leuke- 
mia, Bladder cancer, Melanoma, Pancreatic cancer. 

Conclusions 

Pathway analysis is a useful and widely used statistical 
approach to test groups of genes between two or more 
biological conditions. Although many efforts have been 
dedicated to implement novel gene set analysis in a 
multivariate and topological contexts, few of them deal 
with time course experiments. Time course experiments 
are used to monitor the dynamics of biological processes 
under physiological conditions or after perturbations. 

In this context there is a clear trade-off between the 
number of time points and the number of replicates. In 
general, if the goal of the study is the identification of time- 
dependency, long time course are required at the expense 
of replicates; on the other hand, if the goal is the character- 
ization of short term response a large number of replicates 
for each time point is required to increase statistical power. 
In general, there are few long time series datasets and in 
our opinion this is pardy due to the experimental costs but 
also to the lack of effective methods to study and interpret 
results. Here, we present timeClip, an empirical two- 
step approach specifically tailored to long time course gene 
expression data without replicates. Using simulated data 
timeClip shows good performance in terms of control- 
ling type I error and power. Furthermore, we successfully 
identify most of the key pathways involved in the early, 
middle and late phases of the skeletal muscle regeneration 
process. A visualization tool has also been implemented to 
taclde the dynamics of the transcriptome. 

Additional material 



Additional file 1: Additional tables. This file contains additional 
tables mention on the text (pdf format) 

Additional file 2: Figure 2 Heat map values. This file contains the 
values used to create the heat map in Figure 2. In column 2 the 



pathways that are called significant also by PCA-maSigFun with an 
alpha <= 0.05 are marked with "*". In addition p - values and 
adjusted p - values (Bonferroni) for timeClip are show in col 3 and 
4 (tab delimited format) 

Additional file 3: Activation of the HIF-1 signaling pathway. KEGG 
representation of HIF-1 signaling pathway. Genes of the 37 clique 
long path colored in cyan 
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